Research Question

How do units of housing supply completed in the last 12 years compare across regions of England when adjusting by regional populations?

Given the current housing crisis and shortage in the UK, I became interested in how the different regions of England in particular are performing in their supply of new homes. I looked into figures on gov.uk and found data on net additional dwellings completed, defined as “the absolute change in stock between one year and the next, including losses and gains from new builds, conversions, changes of use (for example a residential house to an office) and demolitions”.

Because more populous regions would be expected to have higher new housing figures, I used data on regional population by year from the Office for National Statistics (ONS) to create a figure per 1,000 people for each region and year. This allows easier comparisons across regions.
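The adjustment itself is simple arithmetic. A minimal sketch in base R, using the East Midlands 2011 figures that appear in the merged data later in this write-up; the helper name `per_thousand` is only for illustration and is not part of the project code:

```r
# Hypothetical helper: new homes per 1,000 residents
per_thousand <- function(new_homes, population) {
  new_homes / population * 1000
}

# East Midlands, 2011 (values taken from the merged data shown later)
rate <- per_thousand(new_homes = 12426, population = 4537448)
round(rate, 2)  # 2.74
```

Dividing by population first and then scaling by 1,000 keeps the figure readable without changing the ranking of regions.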

I decided on a line plot with “Year” on the X axis and “New Homes completed per 1000” on the Y axis, with one line for each of the 9 English regions to display the trends over time. I also wanted an interactive element so users could compare, isolate, and zoom into regions and years of interest.

Data Preparation

Setting up & loading packages

The code below sets up the required libraries. renv was used for package management, and the package versions used are listed in /renv.lock in the repository.

# Package version management: using "renv" package to ensure the packages used in this project are preserved if/when cloning
if(!require('renv')) install.packages('renv')
library(renv)
# Restores the environment from renv.lock file
renv::restore()
## - The library is already synchronized with the lockfile.
# My library:
# here determines the project root directory automatically and constructs file paths relative to that root, not the current working directory
library(here)
# readODS package needed to read the Net Additional Dwellings .ods dataset
library(readODS)
# readxl package needed to read the Regional Population .xlsx dataset
library(readxl)
# Library required for data wrangling: mainly using tidyr and dplyr from tidyverse
library(tidyverse)
# ggplot2 allows for plotting with ggplot
library(ggplot2)
# plotly allows interactivity from ggplot plots. I like the use of plot zooming and custom hover text which is why plotly was used
library(plotly)

Importing data

We will merge these data sets to get the number of new housing units relative to each region’s population for a given year, per 1,000 people, to allow regional comparisons. The columns used for the merge, Region and Year, therefore need matching names and values in both data sets.

I had a look at the data being used in Excel and as a tibble using View() after importing, so already have an idea of what needs to be edited.

# here constructing file paths relative to the project root
# If downloading the data from sources, please ensure relevant data folder and file names below match
file_path_homes <- here("data", "raw_netadditionaldwellings.ods")
file_path_pop <- here("data", "raw_population.xlsx")

# Read data from the files
data_homes <- read_ods(file_path_homes, sheet = 5) #unrounded sheet for more accurate figures
data_pop <- read_xlsx(file_path_pop, sheet = 11) #mid-year population estimates 2011-2023

# See new imported data sets are correct and review what edits need to be made.
# Best to View in R initially
# Data needs to match by Year and Region, therefore need to ensure relevant wrangling is completed...
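Since the import depends on the data folder and file names matching, it can help to fail early with a clear message when a file is missing. A minimal base R sketch, assuming a hypothetical helper called `check_inputs` (not part of the project code) and a temporary file standing in for the real data:

```r
# Hypothetical helper: stop with a readable message if any input file is absent
check_inputs <- function(paths) {
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing input file(s): ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Demo with a temporary file standing in for the real data files
demo_file <- tempfile(fileext = ".ods")
file.create(demo_file)
check_inputs(demo_file)  # passes silently

# A path that does not exist triggers the error
result <- try(check_inputs("no_such_file.xlsx"), silent = TRUE)
inherits(result, "try-error")  # TRUE
```

In this project the call would be `check_inputs(c(file_path_homes, file_path_pop))`, placed before the read step. The sheet positions could also be confirmed with `readODS::list_ods_sheets()` and `readxl::excel_sheets()` rather than relying on the numbers 5 and 11.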

Wrangle (Part 1/3): Net Additional Dwellings data

# We are looking for regions of England, from 2011 to 2023 as no pre-2011 population data was available unfortunately
# As mentioned, ideally you'd View(data_homes) in R at this stage
# We need to remove blank columns and rows, remove the England column, rename Year column data to just numbers, remove pre-2011 data, and convert the regions to long

#remove 1st 2 rows as they are irrelevant 
data_homes <- data_homes[-1:-2, ]

#make first row the headings for columns, then delete first row
colnames(data_homes) <- data_homes[1, ]
data_homes <- data_homes[-1, ]

# Check if that looks right
head(data_homes) 
## # A tibble: 6 × 12
##   Components of net hou…¹ Year  `North East` `North West` Yorkshire and The Hu…²
##   <chr>                   <chr> <chr>        <chr>        <chr>                 
## 1 Net additions [note 1]  2000… 2890         10720        10800                 
## 2 Net additions [note 1]  2001… 4489         13543        12752                 
## 3 Net additions [note 1]  2002… 5343         18120        13452                 
## 4 Net additions [note 1]  2003… 5231         21853        16252                 
## 5 Net additions [note 1]  2004… 6972         21411        15024                 
## 6 Net additions [note 1]  2005… 6927         23832        18684                 
## # ℹ abbreviated names: ¹​`Components of net housing supply`,
## #   ²​`Yorkshire and The Humber`
## # ℹ 7 more variables: `East Midlands` <chr>, `West Midlands` <chr>,
## #   `East of England` <chr>, London <chr>, `South East` <chr>,
## #   `South West` <chr>, England <chr>
# Delete "England" column as we're just seeing regional differences
data_homes <- data_homes %>%
  select(-"England")

# Filtering for only Net data
data_homes <- data_homes %>% filter(
  str_starts(`Components of net housing supply`, "Net additions")
  )

# Can now delete this "Components of net housing supply" column as have used filters
data_homes <- data_homes %>%
  select(-`Components of net housing supply`)

# Rename Year as first 4 digits: cannot convert to numeric otherwise
data_homes$Year <- str_sub(data_homes$Year, 1, 4)

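# Filter to remove pre-2011 data to match the data set with the population data we have
# Year is still text here, so convert it so the comparison is numeric rather than alphabetical
data_homes <- data_homes %>%
  filter(as.numeric(Year) >= 2011)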
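
A note on filtering by year: `Year` is stored as text at this stage, and in R comparing text with a number is done as a string comparison. Four-digit years happen to sort correctly either way, but converting to a number before comparing is the safer pattern. A small base R illustration with made-up labels:

```r
years <- c("2009", "2011", "2023", "999")  # made-up labels for illustration

# Character vs number: 2011 is coerced to "2011" and compared as text,
# so "999" counts as larger because "9" sorts after "2"
text_cmp <- years >= 2011
text_cmp

# Converting first gives the intended numeric comparison
num_cmp <- as.numeric(years) >= 2011
num_cmp
```

The two results differ only for the odd "999" label, which is exactly the kind of stray value that would otherwise slip through unnoticed.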

# Make long
data_homes <- pivot_longer(data_homes, 
             cols = -Year, #all columns except Year as it's already long
             names_to = "Region",
             values_to = "New_Homes")
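
After reshaping, the long table should hold one row per year and region. A minimal sketch of that check on a toy wide table; the values are invented, `toy_wide` and `toy_long` are hypothetical names, and only `pivot_longer()` from tidyr is assumed:

```r
library(tidyr)

# Toy wide table: one row per year, one column per region (invented values)
toy_wide <- data.frame(
  Year = c("2011", "2012"),
  `North East` = c("10", "20"),
  London = c("30", "40"),
  check.names = FALSE
)

toy_long <- pivot_longer(toy_wide,
                         cols = -Year,
                         names_to = "Region",
                         values_to = "New_Homes")

# Expect years x regions rows: 2 x 2 = 4 here, 13 x 9 = 117 in the real data
nrow(toy_long) == nrow(toy_wide) * (ncol(toy_wide) - 1)
```

The same comparison on the real data is a quick way to confirm that no region column was left out of the pivot.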

# Quick check whether we have missing data
anyNA(data_homes)
## [1] FALSE
# FALSE, meaning no missing values

# Convert to numeric to allow plotting
data_homes <- data_homes %>%
  mutate(across(c(Year, New_Homes), as.numeric))

# How does it look?
print(data_homes)
## # A tibble: 117 × 3
##     Year Region                   New_Homes
##    <dbl> <chr>                        <dbl>
##  1  2011 North East                    3939
##  2  2011 North West                   10612
##  3  2011 Yorkshire and The Humber     12066
##  4  2011 East Midlands                12426
##  5  2011 West Midlands                10206
##  6  2011 East of England              18460
##  7  2011 London                       29672
##  8  2011 South East                   24835
##  9  2011 South West                   18570
## 10  2012 North East                    3589
## # ℹ 107 more rows
# Looks good!

Plot: Net Additional Dwellings data

#Quick view of our data

plot_homes <- ggplot(data = data_homes, mapping = aes(
  x = Year, y = New_Homes, 
  group = Region, colour = Region)) + 
  geom_point() + 
  geom_line() +
  labs(title = "New Homes added by Region, 2011-2023", 
         x = "Year", 
         y = "New Homes") +
    scale_x_continuous(breaks = seq(2011, 2023))

print(plot_homes)

ggsave(plot = plot_homes, "plots/plot_newhomes.png")
## Saving 7 x 5 in image

Wrangle (Part 2/3): Population data

#Check imported data
head(data_pop)
## # A tibble: 6 × 16
##   MYE4: Population estim…¹ ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10
##   <chr>                    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 This worksheet contains… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## 2 To turn off freeze pane… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## 3 Please choose from the … <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## 4 This met my needs, plea… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## 5 I need something slight… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## 6 This is not what I need… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## # ℹ abbreviated name:
## #   ¹​`MYE4: Population estimates: Summary for England and Wales, mid-2011 to mid-2023`
## # ℹ 6 more variables: ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>,
## #   ...15 <chr>, ...16 <chr>
#1st column, and first 6 rows can go. Will also need to rename Year headings to only numeric, and convert to long

#remove first 6 rows, no data
data_pop <- data_pop %>% slice(-1:-6)

#remove first column, no data
data_pop <- data_pop[ ,-1]

#rename headers as the first row, then remove first row
colnames(data_pop) <- data_pop[1, ]
data_pop <- data_pop[-1, ]

#rename just 1st col to Region to match New Homes data
colnames(data_pop)[colnames(data_pop) == "Name"] <- "Region"

#view only regional data
data_pop <- data_pop %>%
  filter(grepl("Region", Geography))

#remove Geography column as now not needed
data_pop <- data_pop[ ,-2]

#convert to long
data_pop <- data_pop %>%
  pivot_longer(
    cols = starts_with("Mid"),  # Select the year columns (Mid-2011 to Mid-2023)
    names_to = "Year",         # The new column for year
    values_to = "Population"   # The new column for population values
  )

# Remove 'Mid-' from the Year column
data_pop$Year <- gsub("Mid-", "", data_pop$Year)

#Change from UPPERCASE to LikeThis
library(tools)
data_pop$Region <- toTitleCase(tolower(data_pop$Region))
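
`toTitleCase()` follows English title-case rules, so short joining words such as "and" and "the" stay in lower case. That matters later, when the region names have to match the dwellings data exactly. A quick sketch, assuming the raw names are upper case as they are in this file:

```r
library(tools)

# Assumed upper-case inputs, as in the population file
raw_names <- c("NORTH EAST", "YORKSHIRE AND THE HUMBER")
clean_names <- toTitleCase(tolower(raw_names))
clean_names

# Compare with the spelling used in the dwellings data
clean_names == c("North East", "Yorkshire and The Humber")
```

The second comparison comes back FALSE because of the lower-case "the", which is why a manual rename is still needed before merging.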

#Quick check whether we have missing data
anyNA(data_pop)
## [1] FALSE
#FALSE, meaning no missing values

#make numeric for plot
data_pop <- data_pop %>%
  mutate(across(c(Year, Population), as.numeric))

print(data_pop)
## # A tibble: 117 × 3
##    Region      Year Population
##    <chr>      <dbl>      <dbl>
##  1 North East  2023    2711380
##  2 North East  2022    2682069
##  3 North East  2021    2647493
##  4 North East  2020    2637426
##  5 North East  2019    2636676
##  6 North East  2018    2629393
##  7 North East  2017    2623787
##  8 North East  2016    2618253
##  9 North East  2015    2611804
## 10 North East  2014    2610482
## # ℹ 107 more rows

Plot: Population

plot_pop <- ggplot(data_pop, aes(x = Year, y = Population, group = Region, colour = Region)) +
  geom_line() +
  geom_point() +
  labs(title = "Population by Region, 2011-2023", 
       x = "Year", 
       y = "Population") +
  scale_x_continuous(breaks = seq(2011, 2023))

print(plot_pop)

ggsave(plot = plot_pop, "plots/plot_population.png")
## Saving 7 x 5 in image

Wrangle (Part 3/3): Merging datasets

#We want to combine the datasets into a new data frame where Region and Year remain as key columns, with New Homes and Population alongside them
#Names of Regions and Years must be the same in both datasets to merge correctly, so let's make sure they match

#Check if we have anything earlier than 2011
any(data_homes$Year < 2011)
#False, great!
any(data_pop$Year < 2011)
#False, great!

unique(data_homes$Region)
unique(data_pop$Region)
#Looks like there are some differences in Region names with East and Yorkshire...
#I prefer "East of England" and "Yorkshire and The Humber" as names from the Dwellings df, so will rename these in the Population df
#rename "East" to "East of England", and "Yorkshire and the Humber" to "Yorkshire and The Humber"
data_pop <- data_pop %>%
  mutate(
    Region = recode(Region,
                    "East" = "East of England",
                    "Yorkshire and the Humber" = "Yorkshire and The Humber")
  )

#We can now create a new data frame where new homes are adjusted by population for each region

#merging with Region and Year labels remaining
data_final <- merge(data_homes, data_pop, by.x = c("Region", "Year"), by.y = c("Region", "Year"))
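
`merge()` performs an inner join by default, so any Region/Year pair present in only one table is dropped without warning rather than turned into NA. A minimal base R sketch of a pre-merge check on toy data; all values are invented, and `key_homes` and `key_pop` are illustrative names:

```r
# Toy tables standing in for data_homes and data_pop (invented values)
toy_homes <- data.frame(Region = c("East of England", "London"),
                        Year = c(2011, 2011),
                        New_Homes = c(100, 200))
toy_pop <- data.frame(Region = c("East", "London"),
                      Year = c(2011, 2011),
                      Population = c(50000, 80000))

# Build a combined key and list pairs that would not find a partner
key_homes <- paste(toy_homes$Region, toy_homes$Year)
key_pop   <- paste(toy_pop$Region, toy_pop$Year)
setdiff(key_homes, key_pop)  # pairs only in the homes table
setdiff(key_pop, key_homes)  # pairs only in the population table

# The inner join keeps only the matching pair
toy_merged <- merge(toy_homes, toy_pop, by = c("Region", "Year"))
nrow(toy_merged)  # 1
```

On the real data, both `setdiff()` calls should return nothing and the merged table should have the same 117 rows as each input.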

#Check all 9 regions are here
unique(data_final$Region)
#yep
#Check all years we're interested in are accounted for
unique(data_final$Year)
#Yep
#Any NA values in the merged data?
anyNA(data_final)
#FALSE, looks good to me!

#view the data
print(data_final)

#make new columns numeric for plotting
data_final <- data_final %>%
  mutate(across(c(Year, New_Homes, Population), as.numeric))

# Calculate new homes per 1,000 people (New_Homes / Population * 1000) and store it in a new column
data_final$Homes_By_Pop <- ((data_final$`New_Homes` / data_final$`Population`)*1000)

#see the new column
head(data_final)
##          Region Year New_Homes Population Homes_By_Pop
## 1 East Midlands 2011     12426    4537448     2.738544
## 2 East Midlands 2012     12449    4570702     2.723652
## 3 East Midlands 2013     13949    4604568     3.029383
## 4 East Midlands 2014     16857    4642629     3.630917
## 5 East Midlands 2015     18896    4681640     4.036192
## 6 East Midlands 2016     20717    4731615     4.378420

Final Plot for Assessment

# Plot it all!
plot_final <- ggplot(data_final, aes(
  x = Year, 
  y = Homes_By_Pop, 
  group = `Region`, 
  colour = `Region`, 
  text = paste(
    Region, "|", Year, 
    "<br>Population:", format(Population, big.mark = ","), 
    "<br>Completed Builds:", format(New_Homes, big.mark = ","), 
    "<br><b>Homes completed per 1000 people:</b>", round(Homes_By_Pop, 2)))) +
  geom_line() +
  geom_point(size = 2) +
  labs(title = "New Homes Completed to Population by English Region, 2011-2023", 
       x = "Year",
       y = "Homes per 1000 people",
       caption = "Data sourced from ONS and gov.uk"
       ) +
  scale_x_continuous(
    breaks = seq(2011, 2023)) +
  scale_y_continuous(
    breaks = seq(1, 5.5, by = 0.25),
      limits = c(1, 5.5)) +
  theme(
    title = element_text(family = "Arial", face = "bold", size = 14),
    axis.title.x = element_text(family = "Arial", face = "italic", size = 12),
    axis.title.y = element_text(family = "Arial", face = "italic", size = 12),
    axis.text = element_text(family = "Arial", size = 10),
    legend.title = element_text(family = "Arial", face = "bold", size = 12),
    legend.text = element_text(family = "Arial", size = 10),
    legend.key.height = unit(0.1, "cm"),  # Reduce vertical space between legend items
    legend.spacing.y = unit(0.1, "cm"),   # Additional control for vertical spacing
    panel.background = element_rect(fill = "white", color = NA), # White panel background
    plot.background = element_rect(fill = "white", color = NA), # White plot background
    axis.ticks = element_line(color = "black")) # Black ticks

# Convert ggplot to plotly for interactivity & allow custom hover text to show
# Width and height are set here because specifying them in layout() is deprecated
plot_final_int <- ggplotly(plot_final, tooltip = "text", width = 1000, height = 500)

# Legend layout
plot_final_int <- plot_final_int %>% layout(
  legend = list(
    x = 1,  # Position the legend to the right of the plot
    y = 0.5, # Vertically center the legend
    xanchor = 'left', # Align the left side of the legend with the x position
    yanchor = 'middle', # Align the middle of the legend with the y position
    orientation = 'v'  # Vertical arrangement of legend items
    )
  )
# Show the interactive plot
plot_final_int
#save final to plots file - interactive
htmlwidgets::saveWidget(plot_final_int, "plots/nhomesper1000_interac.html")

#save the static version also
ggsave(plot = plot_final, "plots/nhomesper1000_static.png")
## Saving 7 x 5 in image
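
Hard-coded axis limits are worth double-checking: with ggplot2's default behaviour, values outside the limits passed to `scale_y_continuous()` are treated as missing and dropped from the plot. A small base R sketch of such a check, using invented rates and a hypothetical helper name:

```r
# Hypothetical helper: TRUE if every non-missing value lies inside the limits
within_limits <- function(values, limits) {
  rng <- range(values, na.rm = TRUE)
  rng[1] >= limits[1] && rng[2] <= limits[2]
}

toy_rates <- c(1.5, 2.74, 4.38)  # invented per-1,000 rates

inside  <- within_limits(toy_rates, c(1, 5.5))
outside <- within_limits(c(toy_rates, 6.1), c(1, 5.5))
inside   # TRUE
outside  # FALSE: 6.1 would be dropped from the plot
```

For this project the call would be `within_limits(data_final$Homes_By_Pop, c(1, 5.5))`, run before the limits are fixed in the plot.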

Conclusions

My initial thoughts: I was quite surprised to see how few homes were being built. I’d expected a more noticeable increase as populations grew over the years. There is an obvious dip in 2020, with a lag into 2021, due to the halt in construction activity, and it seems we’re still in a sort of housing slog.

Limitations & Future ideas

Although my project offers an interesting insight into new housing supply differences across English regions, it of course overlooks the contextual and political factors involved in housing supply. Economic conditions (e.g., unemployment, income levels), government policies (e.g., housing subsidies, planning permissions/regulations) and details regarding types of housing or population density are not considered. Future projects may wish to investigate adjusting for these factors for a more nuanced insight.

My initial idea for this project was to use data on affordable units of housing available regionally and find the proportion of net housing that is affordable, to see where in England there are the most affordable options (both ownership and rent). This fell through when I realised the affordable housing data is only available as a gross figure (i.e. not taking into account any losses of affordable housing), so any proportion presented would be misleading. This would be a really interesting future project if the net data is published. There is some “Official Statistics in Development” data, covering affordable housing for rent only, if interested (see references).
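
To illustrate why a gross figure can mislead here, a small sketch with entirely invented numbers for one region and year. The count of losses is an assumption made up for the example, since that is exactly the figure the gross data leaves out:

```r
# Invented figures for one region and year, purely for illustration
gross_affordable <- 500   # affordable homes added (gross)
affordable_lost  <- 200   # affordable homes lost (assumed, not in the gross data)
net_total        <- 1000  # net additional dwellings, all types

share_using_gross <- gross_affordable / net_total
share_using_net   <- (gross_affordable - affordable_lost) / net_total

share_using_gross  # 0.5
share_using_net    # 0.3
```

With these made-up numbers, dividing a gross count by a net total overstates the affordable share by 20 percentage points, and the size of the error would vary between regions.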

References

Net Additional Dwellings data: gov.uk | Table 118: annual net additional dwellings and components, England and the regions (ODS, 54KB) -
https://www.gov.uk/government/statistical-data-sets/live-tables-on-net-supply-of-housing#live-tables (Updated: 28 November 2024)

Involvement from: Ministry of Housing, Communities and Local Government, Ministry of Housing, Communities & Local Government (2018 to 2021), Department for Levelling Up, Housing and Communities

Population estimates data: ONS | Mid-2023: 2023 local authority boundaries edition of this dataset (xlsx, 813.1KB) -
https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/estimatesofthepopulationforenglandandwales (Released: 15 July 2024)

Unused data: Links for anyone interested in affordable data sets mentioned:
https://www.gov.uk/government/statistical-data-sets/live-tables-on-affordable-housing-supply
https://assets.publishing.service.gov.uk/media/65c0b5dec43191000d1a451f/Net_Affordable_Housing_for_Rent.ods/preview